-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Perf improvements to collections::BitSet
.
#25230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see CONTRIBUTING.md for more information. |
This looks like a breaking change – some stable items have different signatures. |
FWIW, |
@Stebalien, yes you are right, only some methods are declared stable, but not the Struct or the module itself. I meant stuff like this, by the way: #[stable(feature = "rust1", since = "1.0.0")]
-pub struct Union<'a>(TwoBitPositions<'a>);
+pub struct Union<'a>(BlockIter<TwoBitPositions<'a>>); |
The signature is |
Yes. You are right again. Just ignore me, I'm not in a Rusty frame of mind :) |
@@ -1451,8 +1451,8 @@ impl BitSet { | |||
/// ``` | |||
#[inline] | |||
#[stable(feature = "rust1", since = "1.0.0")] | |||
pub fn iter(&self) -> bit_set::Iter { | |||
SetIter {set: self, next_idx: 0} | |||
pub fn iter<'a>(&'a self) -> bit_set::Iter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These annotations seem strictly redundant?
Great wins! Basically lgtm modulo minor style nits. |
r? @gankro (moving reviewers) |
Thanks for the comments, I'll try to address them soon. I'm currently stuck in an airport 6000 miles from home so it will take a few days to get this up to snuff! |
@rayglover Ok, no worries! |
I've fixed the oddness in the |
r? @alexcrichton (want decision on Octal/Hex is Debugish vs Displayish) |
r? @gankro wrong PR |
@bors r+ Thanks! |
📌 Commit 307fab1 has been approved by |
Some modest running-time improvements to `std::collections::BitSet` on bit-sets of varying set-membership densities. This is work originally from [here](https://github.com/rayglover/alt_collections). (Benchmarks copied below) ``` std::collections::BitSet / alt_collections::BitSet copy_dense ... 3.08x copy_sparse ... 4.22x count_dense ... 11.01x count_sparse ... 8.11x from_bytes ... 1.47x intersect_dense ... 6.54x intersect_sparse ... 4.37x union_dense ... 5.53x union_sparse ... 5.60x ``` The exception is `from_bytes`, which I've left unaltered since the optimization is rather obscure. Compiling with the cpu feature `popcnt` gave a further ~10% improvement on my machine, but this wasn't factored in to the benchmarks above. Similar improvements could be made to `BitVec`, although that would probably require more substantial changes. criticism welcome!
} | ||
|
||
return None; | ||
// from the current block, isolate the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more efficient to use http://doc.rust-lang.org/nightly/std/primitive.u32.html#method.trailing_zeros to get this index. (At least, I'd hope that LLVM lowers that instruction to the most efficient thing for the given platform.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huonw Ahh, you're probably right in the general case because the particular opcode (BSF
) is more widely supported, whereas POPCNT
needs to be explicitly enabled..
It would also remove the additional bit-twiddling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I would assume that using the direct method (which actually calls an LLVM intrinsic) would always be at least as fast as this method: if it isn't, then it's an LLVM bug (since LLVM can implement the intrinsic via this method on platforms where this is the fastest).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why it would be an LLVM bug; llvm.cttz
will generally result in a BSF
instruction on most hardware. That's fine, except that BSF
is comparatively very slow, several multiples slower than POPCNT
(the instruction behind llvm.ctpop
) on the same hardware.
However, BSF
is more widely supported, and where POPCNT
isn't available (or isn't enabled) then the resulting assembly is an additional ~12 instructions on 64-bit hardware (I don't know about 32-bit).
A good review of this algorithm (and the various ways of implementing it) can be found here. The fastest popcnt
-based variant is consistently faster (~10-25%), although in the rust implementation the difference is less pronounced due to constant factors being more dominant.
As I said, std
isn't normally compiled with the relevant cpu feature enabled for most of my points to be relevant, so it seems to make more sense to use your suggestion, which I've made here.
Some modest running-time improvements to
std::collections::BitSet
on bit-sets of varying set-membership densities. This is work originally from here. (Benchmarks copied below)The exception is
from_bytes
, which I've left unaltered since the optimization is rather obscure.Compiling with the cpu feature
popcnt
gave a further ~10% improvement on my machine, but this wasn't factored in to the benchmarks above.Similar improvements could be made to
BitVec
, although that would probably require more substantial changes.criticism welcome!